A new correlation belief function in Dempster-Shafer evidence theory and its application in classification

Uncertain information processing is a key problem in classification. Dempster-Shafer evidence theory (D-S evidence theory) is widely used in uncertain information modelling and fusion. For uncertain information fusion, however, Dempster's combination rule in D-S evidence theory may produce counterintuitive fusion results in some cases. In this paper, a new correlation belief function is proposed to address this problem. The proposed method transfers the belief from a certain proposition to other related propositions to avoid the loss of information during fusion, which can effectively address conflict management in D-S evidence theory. Classification experiments on UCI datasets show that the proposed method not only assigns a higher belief to the correct propositions than other methods, but also expresses the conflict among the data clearly. The robustness and superiority of the proposed method in classification are verified through experiments on different datasets with varying proportions of the training set.

www.nature.com/scientificreports/ set and form the noise detection training data. This work adopts multi-source information fusion technology [16][17][18][19] for uncertain information processing in classification. Information fusion technology has been greatly developed and applied in practical applications such as decision-making [20][21][22][23], pattern recognition 24,25, fault diagnosis [26][27][28], risk analysis 29,30, and reliability assessment [31][32][33]. Many mathematical methods are adopted for information fusion, such as Dempster-Shafer evidence theory (D-S evidence theory) 34,35, belief function theory 36,37, fuzzy set theory 38, probability theory 39, D-numbers 40, Z-numbers 41, generalized evidence theory 42,43, and so on 44,45. As a widely used theory in information fusion, D-S evidence theory is an effective method for modeling and fusing uncertain information in many fields such as clustering 46,47, classification [48][49][50], fault diagnosis 51,52, decision support systems 53, reliability analysis 54, correlation analysis 19,55, multi-attribute decision analysis 56, and so on 57,58. Nevertheless, there are still some open issues to be addressed, including the computational complexity of Dempster's combination rule 59 and uncertainty measurement in the evidence theory [60][61][62][63]. Uncertainty measurement in D-S evidence theory is an important step in fusing potentially conflicting information. To address uncertainty management in the evidence theory, Gao et al. 64 propose a new uncertainty measurement based on Tsallis entropy. Based on the belief intervals of D-numbers, Deng and Jiang 65 propose a total uncertainty measurement that satisfies several basic properties including the range, monotonicity, and generalized set consistency. Deng and Wang 66 measure the Hellinger distance between the belief interval and the most uncertain interval for each single case as the total uncertainty.
Besides, for the existing methods of uncertainty measurement, Moral-García and Abellán 67 pointed out that the maximum value of entropy on the belief interval is the most suitable way of measurement for practical applications because of its excellent mathematical properties. This work focuses on information fusion in classification problem with respect to uncertainty management with a new correlation factor in the framework of D-S evidence theory.
There are many works proposing new classification methods based on D-S evidence theory or belief functions. Geng et al. 68 combine evidence association rule with classification and propose an evidence association rule-based classification method. Wang et al. 69 propose an ensemble classifier that uses the evidence theory to fuse the outputs of multiple classifiers. For classification problem with high-dimensional data, Su et al. 70 establish a rough evidential K-NN classification rule in the framework of rough set theory which selects features by minimizing the neighborhood pignistic decision error rate. To address the uncertainty caused by fuzzy data, Li et al. 71 propose a new framework to combine the results of multi-supervised classification and clustering based on belief function. With the popularity of deep learning, Tong et al. 72 propose the use of convolutional and pooling layers in convolutional neural networks to extract data features and then transform them into belief function. From the perspective of information fusion and uncertainty management in classification, in this paper, a novel correlation belief function is proposed to manage the uncertainty and improve the performance of D-S evidence theory in information fusion in classification.
The rest of this paper is organized as follows. Dempster-Shafer evidence theory is reviewed in section "Preliminaries". Section "The correlation belief function" introduces the correlation belief function with some numerical examples. Section "Application in classification" is the correlation belief function-based classification method and its application. Section "Discussion" discusses the robustness and superiority of the proposed method. Conclusions are given in section "Conclusions".

Preliminaries
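Dempster-Shafer evidence theory. Definition 1 Define $\Theta$ as a nonempty set of $n$ exhaustive and mutually exclusive elements:

$$\Theta = \{\theta_1, \theta_2, \ldots, \theta_i, \ldots, \theta_n\}$$

$\Theta$ is called the frame of discernment (FOD).
The power set of $\Theta$ is composed of $2^n$ propositions, which can be denoted as follows:

$$2^{\Theta} = \{\emptyset, \{\theta_1\}, \{\theta_2\}, \ldots, \{\theta_n\}, \{\theta_1, \theta_2\}, \ldots, \Theta\}$$

Definition 2 For $\Theta$, a basic belief assignment (BBA), which is also called a mass function, is a mapping $m: 2^{\Theta} \to [0, 1]$. $m$ satisfies:

$$m(\emptyset) = 0, \qquad \sum_{A \subseteq \Theta} m(A) = 1$$

$A$ is called a focal element if $m(A) > 0$. $m(A)$ indicates the degree to which the evidence supports proposition $A$.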
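As a minimal sketch of Definitions 1 and 2, a BBA can be stored as a mapping from subsets of the FOD to masses. The representation (a Python `dict` keyed by `frozenset`) and the helper names `power_set`, `is_valid_bba` and `focal_elements` are illustrative assumptions, not part of the original method.

```python
from itertools import combinations


def power_set(frame):
    """All 2^n subsets of the frame of discernment, as frozensets (empty set included)."""
    items = sorted(frame)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def is_valid_bba(m, frame, tol=1e-9):
    """Check the BBA conditions: m(empty set) = 0, masses in [0, 1], total mass 1."""
    frame = frozenset(frame)
    if m.get(frozenset(), 0.0) > tol:
        return False
    for A, mass in m.items():
        # Every proposition must lie inside the frame and carry a mass in [0, 1].
        if not (A <= frame) or mass < -tol or mass > 1 + tol:
            return False
    return abs(sum(m.values()) - 1.0) <= tol


def focal_elements(m):
    """Propositions with strictly positive mass."""
    return [A for A, mass in m.items() if mass > 0]
```

Keying the mapping by `frozenset` keeps set operations such as intersection and subset tests directly available when evidence is combined later.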

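Definition 3
In D-S evidence theory, two independent pieces of evidence can be fused by Dempster's combination rule:

$$m(A) = (m_1 \oplus m_2)(A) = \begin{cases} \dfrac{1}{1-K} \sum\limits_{B \cap C = A} m_1(B)\, m_2(C), & A \neq \emptyset \\ 0, & A = \emptyset \end{cases}$$

where $K$ represents the degree of conflict between $m_1$ and $m_2$:

$$K = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C)$$

Pignistic probability transformation. The pignistic probability transformation in the transferable belief model was first proposed by Smets 73 .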

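Definition 4
Assume $m$ is a BBA on $\Theta$, and $A$ and $B$ are any sets in the power set $2^{\Theta}$. Its associated pignistic probability function $BetP_m: \Theta \to [0, 1]$ is defined as follows:

$$BetP_m(A) = \sum_{B \subseteq \Theta,\, B \neq \emptyset} \frac{|A \cap B|}{|B|} \cdot \frac{m(B)}{1 - m(\emptyset)}$$

where $m(\emptyset) \neq 1$ and $|A|$ is the cardinality of subset $A$.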
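The two preliminaries, Dempster's combination rule (Definition 3) and the pignistic probability transformation (Definition 4), can be sketched as follows. BBAs are assumed to be dictionaries mapping `frozenset` propositions to masses; the function names and the Zadeh-style example masses are illustrative assumptions and are not taken from the paper's tables.

```python
def dempster_combine(m1, m2):
    """Fuse two BBAs with Dempster's rule; return (fused BBA, conflict coefficient K)."""
    fused = {}
    conflict = 0.0
    for B, mass_b in m1.items():
        for C, mass_c in m2.items():
            common = B & C
            if common:
                fused[common] = fused.get(common, 0.0) + mass_b * mass_c
            else:
                conflict += mass_b * mass_c  # product mass that falls on the empty set
    if 1.0 - conflict < 1e-12:
        raise ValueError("K = 1: the evidence is completely conflicting")
    return {A: mass / (1.0 - conflict) for A, mass in fused.items()}, conflict


def pignistic(m, A):
    """BetP_m(A): share each mass m(B) equally among the elements of B."""
    A = frozenset(A)
    empty_mass = m.get(frozenset(), 0.0)
    if empty_mass >= 1.0:
        raise ValueError("BetP is undefined when m(empty set) = 1")
    total = sum(len(A & B) / len(B) * mass for B, mass in m.items() if B)
    return total / (1.0 - empty_mass)


# Assumed, illustrative masses with high conflict (not the paper's data).
t1, t2, t3 = frozenset({"t1"}), frozenset({"t2"}), frozenset({"t3"})
m1 = {t1: 0.99, t2: 0.01}
m2 = {t2: 0.01, t3: 0.99}
fused, K = dempster_combine(m1, m2)
```

With these assumed masses almost all of the product mass is conflicting, and the only surviving proposition is the one both sources barely support, which is the kind of counterintuitive outcome the paper sets out to avoid.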

The correlation belief function
In information fusion, it is important to take advantage of all available data. If some information is lost, a counterintuitive combination result may be obtained. For instance, assume the FOD is $\Theta = \{\theta_1, \theta_2, \theta_3\}$, the power set of $\Theta$ is $2^{\Theta}$, and the mass functions $m_1$ and $m_2$ are as follows.

$$m_1: \; m_1(\{\theta_1\}) = p, \quad m_1(B_1) = 1 - p$$

$$m_2: \; m_2(A_i) = q_i$$

where $n$ is the number of elements in the FOD, $0 < p \le 1$, $0 \le q_i \le 1$, $\sum_{i=1}^{n} q_i = 1$, $B_1$ is any set in $2^{\Theta}$ except $\{\theta_1\}$ and $\emptyset$, and $A_i$ is any set in $2^{\Theta}$ whose subsets do not include $\{\theta_1\}$ (that is, $\theta_1 \notin A_i$). No matter how the values of $p$ and $q_i$ change, the combination result $(m_1 \oplus m_2)(\{\theta_1\})$ is always equal to 0. In other words, all information about the proposition $\{\theta_1\}$ in $m_1$ is lost. To address this issue, the correlation belief function is proposed. The correlation belief function consists of two steps: belief gathering and correlation belief transfer. The flowchart of the correlation belief function is shown in Fig. 1.

Belief gathering. Under the closed-world assumption, let $\Theta$ be a set of $n$ mutually exclusive possible values, $\Theta = \{\theta_1, \theta_2, \theta_3, \ldots, \theta_i, \ldots, \theta_n\}$. The power set of $\Theta$ is $2^{\Theta} = \{\emptyset, \{\theta_1\}, \{\theta_2\}, \ldots, \{\theta_1, \theta_2\}, \ldots, \{\theta_1, \theta_2, \theta_3, \theta_i\}, \ldots, \Theta\}$. Single subset propositions in $2^{\Theta}$ are marked as $\alpha_i$ $(i = 1, 2, 3, \ldots, n)$, and multi-subset propositions in $2^{\Theta}$ are marked as $\beta_j$ $(j = 1, 2, 3, \ldots, 2^n - n - 1)$. Assume $m$ is the original BBA on $\Theta$ and $m^*$ is the BBA modified by this step. In this step, each single subset proposition $\alpha_i$ receives its pignistic probability from Eq. (6), and the belief values of the multi-subset propositions $\beta_j$ are set to zero. The modified BBA $m^*$ of propositions $\alpha_i$ and $\beta_j$ is defined as follows:

$$m^*(\alpha_i) = BetP_m(\alpha_i) = \sum_{A \subseteq \Theta,\, A \neq \emptyset} \frac{|\alpha_i \cap A|}{|A|}\, m(A)$$

$$m^*(\beta_j) = 0$$

where $\alpha_i$ is any single subset proposition ($|\alpha_i| = 1$) in the power set $2^{\Theta}$, and $A$ is any proposition in the power set $2^{\Theta}$.
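The belief gathering step can be sketched in a few lines. This is a minimal illustration under the closed-world assumption (no mass on the empty set); the function name `belief_gathering` and the dictionary-of-`frozenset` representation are assumptions made for the example.

```python
def belief_gathering(m, frame):
    """Return m*: each singleton gets its pignistic probability, all other sets get 0.

    Assumes a closed world, i.e. m assigns no mass to the empty set.
    Multi-element propositions are simply absent from the result (mass 0).
    """
    gathered = {}
    for theta in sorted(frame):
        # Each proposition A containing theta contributes m(A) / |A|.
        gathered[frozenset({theta})] = sum(
            mass / len(A) for A, mass in m.items() if theta in A
        )
    return gathered


# Assumed example masses, for illustration only.
frame = {"t1", "t2", "t3"}
m = {
    frozenset({"t1"}): 0.5,
    frozenset({"t1", "t2"}): 0.3,
    frozenset({"t1", "t2", "t3"}): 0.2,
}
m_star = belief_gathering(m, frame)
```

Because the pignistic transformation only redistributes mass, the gathered singleton masses still sum to 1.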
Correlation belief transfer. This step is the core of correlation belief function, which is called correlation belief transfer. It is defined in section "Definition" and a simple example is presented to clearly illustrate the process of this step in section "Illustrative example". Figure 2 visualizes this example, which is also an illustration of STEP 2 in Fig. 1.
Definition. Assume the FOD is $\Theta = \{\theta_1, \theta_2, \theta_3, \ldots, \theta_i, \ldots, \theta_n\}$ and the power set of $\Theta$ is $2^{\Theta}$. $m$ is a BBA on $\Theta$ and $m^*$ is the BBA modified by belief gathering in section "Belief gathering". In this step, each single subset proposition $\alpha_i$ transfers its belief to the multi-subset propositions $\beta_j$ with $\alpha_i \subset \beta_j$, and the result is called the transferred BBA, marked as $m^{**}$. $m_t(\alpha_i \to \beta_j)$ is defined as the belief value transferred from single subset proposition $\alpha_i$ to multi-subset proposition $\beta_j$. The transferred BBA $m^{**}$ of single subset propositions $\alpha_i$ and multi-subset propositions $\beta_j$ is defined as follows:

Illustrative example. To better understand the process of correlation belief transfer, a simple illustrative example is presented. Assume the FOD is $\Theta = \{\theta_1, \theta_2, \theta_3\}$; its power set is $2^{\Theta} = \{\emptyset, \{\theta_1\}, \{\theta_2\}, \{\theta_3\}, \{\theta_1, \theta_2\}, \{\theta_1, \theta_3\}, \{\theta_2, \theta_3\}, \{\theta_1, \theta_2, \theta_3\}\}$. Since the problem is discussed under the closed-world assumption, $\emptyset$ is not taken into consideration. Suppose that the BBAs after belief gathering are given as follows: $m^*(\{\theta_1\}) = p_1$, $m^*(\{\theta_2\}) = p_2$, $m^*(\{\theta_3\}) = p_3$, where $p_1, p_2, p_3 \in (0, 1)$ and $\sum_{i=1}^{3} p_i = 1$. For proposition $\{\theta_1\}$, its belief should be transferred to the propositions $\{\theta_1, \theta_2\}$, $\{\theta_1, \theta_3\}$, and $\{\theta_1, \theta_2, \theta_3\}$ based on the proposed method. The transferred belief value is as follows:

The whole process of this example is illustrated in Fig. 2.
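The set of multi-subset propositions that receive belief from a given single subset proposition can be enumerated directly. The sketch below only lists the receiving propositions (all proper supersets of the singleton within the FOD); it does not compute the transferred values $m_t$, and the function name `receiving_propositions` is a hypothetical helper introduced for illustration.

```python
from itertools import combinations


def receiving_propositions(alpha, frame):
    """List every multi-subset proposition beta with alpha as a proper subset."""
    alpha = frozenset(alpha)
    rest = sorted(frozenset(frame) - alpha)
    return [
        alpha | frozenset(extra)
        for r in range(1, len(rest) + 1)
        for extra in combinations(rest, r)
    ]


receivers = receiving_propositions({"t1"}, {"t1", "t2", "t3"})
```

For a singleton in a frame of $n$ elements this yields $2^{n-1} - 1$ receiving propositions, which is three in the three-element illustrative example.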
Discussion of the correlation belief function. To summarize the two steps of the correlation belief function: first, all the belief is put into the single subset propositions. This step gathers the belief to make decision-making easier and the subsequent correlation belief transfer more convenient. In the next step, the belief of single subset propositions is transferred to correlated multi-subset propositions. Note that in this assignment, the single subset proposition must be a subset of the multi-subset proposition. In other words, the intersection of the single subset proposition that supplies belief and the multi-subset proposition that receives belief is not the empty set. The idea is that if the belief of proposition $\{\theta_1\}$ is greater than 0, the belief of any proposition that contains $\{\theta_1\}$ must also be greater than 0. For example, assume that there are three opaque bags $\{\theta_1\}$, $\{\theta_2\}$, and $\{\theta_3\}$, and a ball is placed in one of them at random. Now pack bag $\{\theta_1\}$ and bag $\{\theta_2\}$ into a larger bag $\{\theta_1, \theta_2\}$. If it is stated that the ball is in bag $\{\theta_1\}$, it is reasonable to conclude that it is also in the larger bag $\{\theta_1, \theta_2\}$. That is to say, if $m(\{\theta_1\}) > 0$, $m(\{\theta_1, \theta_2\})$ is also supposed to be greater than 0.
The main advantage of the correlation belief function is that it makes use of the source evidence information to eliminate counterintuitive combination results. When the belief of some propositions is 0, there is often high conflict between the pieces of evidence, and the correlation belief function can address this issue well. If one of the data's attributes has a value of 0 due to a fault, this method can transfer the related attribute value to it. If the value of its related attribute is also equal to 0, it is rational to believe that the sensors did not receive any signal about this attribute, and that the collected data is reliable and effective. The proposed method is consistent with people's intuition and greatly enhances the robustness of Dempster's combination rule. In brief, even if the data collected in a complex environment are not accurate enough, they will not have a decisive impact on the final combination result, especially when a large amount of data is processed.
Numerical examples. Start with conflicting evidence fusion based on Dempster's combination rule.

Example 1
Assume that the FOD is $\Theta = \{\theta_1, \theta_2, \theta_3\}$ and two BBAs are as follows:

If Dempster's combination rule is used to fuse the two BBAs directly, the result will be counterintuitive:

Based on the proposed method in Eqs. (7)-(11), the modified BBAs are calculated as follows.
Step 1: Belief gathering:

Step 2: Correlation belief transfer:

$m_2$ is calculated in the same way. The modified BBAs are given in Table 1.
The combination result compared with Dempster's method is shown in Table 2. From Table 2, it can be seen that the result of the proposed method is more reasonable than using Dempster's combination rule directly. As can be seen from the first piece of original evidence, $m_1(\{\theta_1\})$ is 0.99, which means that proposition $\{\theta_1\}$ has a very high probability of happening. But when the first piece of evidence is combined with the other one, the result shows that $m(\{\theta_1\})$ is 0, which means proposition $\{\theta_1\}$ is never going to happen. In other words, the first piece of evidence about proposition $\{\theta_1\}$ is completely denied by the other one, thus losing its support for proposition $\{\theta_1\}$. The main reason for this unreasonable result is that in the other piece of evidence, all proposi- . If we modify the original evidence like:

the fusion result is quite different:

Therefore, the proposed method, which transfers the belief from single subset propositions to correlated multi-subset propositions and maintains the support of the original belief as much as possible, is effective and reliable.
Example 2 Suppose that the FOD is $\Theta = \{\theta_1, \theta_2\}$ and two BBAs are given as follows:

Use the proposed method to modify the original BBAs, and get the result by using Dempster's combination rule:

In this example, the two pieces of evidence are completely conflicting and the classical Dempster's combination rule cannot address this problem. However, by using the correlation belief function to modify the original BBAs, the result is satisfactory: proposition $\{\theta_1\}$ and proposition $\{\theta_2\}$ have equal belief, and proposition $\{\theta_1, \theta_2\}$ is also given a tiny amount of belief. This example also shows another crucial advantage of the correlation belief function: it can deal with completely conflicting evidence.
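To make explicit why the classical rule breaks down under complete conflict, the short derivation below evaluates the conflict coefficient for a pair of categorical BBAs. The masses $m_1(\{\theta_1\}) = 1$ and $m_2(\{\theta_2\}) = 1$ are assumed here purely for illustration of complete conflict; they are not quoted from the example's own BBAs.

```latex
K = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C)
  = m_1(\{\theta_1\})\, m_2(\{\theta_2\}) = 1 \cdot 1 = 1,
\qquad
\frac{1}{1 - K} = \frac{1}{0} \quad \text{(undefined)}
```

Since the normalization factor is undefined when $K = 1$, Dempster's combination rule yields no fused BBA at all for such evidence unless the BBAs are modified beforehand.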

Example 3
Suppose that the FOD is $\Theta = \{\theta_1, \theta_2, \theta_3\}$ and the BBAs are given as follows:

From this example, it can be seen that although the first piece of evidence believes that proposition $\{\theta_3\}$ can never happen, the latter two pieces of evidence have high belief in proposition $\{\theta_3\}$. Thus, it is reasonable to believe that the proposition $\{\theta_3\}$ is still possible. However, the result with the classical Dempster's combination rule shows that $m(\{\theta_3\}) = 0$, which is illogical and counterintuitive. After modifying the BBAs by the proposed method, the fusion result is:

This result indicates that the belief of proposition $\{\theta_3\}$ is higher than that of proposition $\{\theta_1\}$ and proposition $\{\theta_2\}$, which is in line with the real situation. Although the belief value of multi-subset propositions is increased, it is very small and its effect on decision-making is slight.

Example 4
Suppose that the FOD is $\Theta = \{\theta_1, \theta_2, \theta_3\}$, and the first piece of evidence and the i-th piece of evidence are as follows 19 :

The combination result of $m_1$ and $m_i$ is always consistent with $m_1$. Since Dempster's combination rule satisfies the associative law, no matter how much evidence is added, the result is still consistent with $m_1$, which means the subsequent evidence is ineffective and the result is illogical. The correlation belief function can solve this problem effectively and the result is shown in Figs. 3 and 4.

Example 5
Suppose that the FOD is $\Theta = \{\theta_1, \theta_2\}$ and two BBAs are given as follows:

According to the proposed method, the modified BBAs are as follows:

It can be seen that the proposed method does not increase the belief of proposition $\{\theta_2\}$, because the belief of proposition $\{\theta_2\}$ in the original evidence is 0. Instead, the belief of proposition $\{\theta_1, \theta_2\}$ is increased to 0.167, which is more reasonable. The result by using Dempster's combination rule is as follows:

The result shows that proposition $\{\theta_1\}$ has a very high degree of belief. The proposition $\{\theta_1, \theta_2\}$ is given a small degree of belief, and there is no belief in the proposition $\{\theta_2\}$. Compared with the fusion result without evidence modification, the proposed method loses some belief in proposition $\{\theta_1\}$, but the lost belief is tiny and the method avoids counterintuitive fusion results when conflicting data are fused. The results of Dempster's rule and the proposed method are shown in Table 3.
The first piece of evidence strongly suggests that proposition $\{\theta_1\}$ has a high belief of 0.9, and the proposition $\{\theta_1, \theta_2, \theta_3\}$ also gives $\{\theta_1\}$ a small belief value. The second piece of evidence argues that proposition $\{\theta_3\}$ has a high belief of 0.9, but there is no other proposition supporting proposition $\{\theta_3\}$ (i.e., $m(\{\theta_1, \theta_3\}) = m(\{\theta_2, \theta_3\}) = m(\{\theta_1, \theta_2, \theta_3\}) = 0$). Therefore, in the final combination result, the belief of proposition $\{\theta_1\}$ is slightly higher than that of proposition $\{\theta_3\}$, which is opposite to the result of the classical Dempster's method. In addition, the results of Murphy's method 74 and Abellán's method 75 are similar to that of the proposed method, which further demonstrates the effectiveness of the new method.

Application in classification
In this section, classification experiments with real data sets are performed to evaluate the effectiveness of the proposed method. The real data sets are from the UCI Machine Learning Repository. The classification process is as follows. Firstly, 80% of the dataset is selected as the training set to generate triangular fuzzy number models. Secondly, the triangular fuzzy number models are used to generate BBAs for the remaining 20% of the data. Then, the proposed correlation belief function, which is composed of belief gathering and correlation belief transfer, is used to modify the evidence, and Dempster's combination rule is used to fuse the modified BBAs. Finally, the fused BBA is transformed into beliefs on single subset propositions by the pignistic probability transformation, and the single subset proposition with the highest belief gives the classification category. Figure Table 4. The remaining 10 samples are considered as the test set to verify the effectiveness of the proposed method. In the following, one sample is used to illustrate the process of data fusion, and the complete classification result of the Iris data set is shown in Table 7.
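The final decision step of this pipeline (pignistic transformation of the fused BBA, then choosing the singleton with the highest belief) can be sketched as follows. The function name `classify`, the class labels and the example masses are assumptions for illustration; the earlier steps (triangular fuzzy number models, evidence modification, fusion) are not reproduced here.

```python
def classify(fused_bba, frame):
    """Return (predicted class, BetP per class) for a fused BBA under a closed world."""
    betp = {
        theta: sum(mass / len(A) for A, mass in fused_bba.items() if theta in A)
        for theta in sorted(frame)
    }
    # The class whose singleton has the largest pignistic probability wins.
    label = max(betp, key=betp.get)
    return label, betp


# Assumed fused masses, for illustration only.
fused_bba = {
    frozenset({"t1"}): 0.7,
    frozenset({"t1", "t2"}): 0.2,
    frozenset({"t3"}): 0.1,
}
label, betp = classify(fused_bba, {"t1", "t2", "t3"})
```

Returning the full pignistic distribution alongside the label keeps the residual belief in the other classes visible, which matters when the conflict information itself is of interest.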
Firstly, one test sample from Setosa ($\theta_1$) is randomly selected, and its BBAs generated from the triangular fuzzy number models are shown in Table 5. Next, the BBAs are modified by the correlation belief function and the result is shown in Table 6. Finally, the complete classification result of the test set is shown in Table 7. Table 5 shows that the BBAs contain conflicting information. Specifically, in the evidence derived from the SL and SW attributes, the belief is distributed evenly across propositions $\{\theta_1\}$, $\{\theta_2\}$, and $\{\theta_3\}$, which makes it hard for the decision maker to reach a sound judgment based on these two pieces of evidence. Conversely, in the evidence generated by the PL and PW attributes, the proposition $\{\theta_1\}$ has a higher belief, while the proposition $\{\theta_3\}$ is given no belief at all, which contradicts the SL and SW evidence. Consequently, a good combination result ought to help the decision maker discriminate between the classes while preserving the original conflict information for later decisions.
As can be seen from Table 6, all methods indicate that the sample belongs to class $\theta_1$, which is in line with the actual situation. Although Dempster's combination rule gives the highest belief to proposition $\{\theta_1\}$, it is illogical that it gives a belief of 0 to the proposition $\{\theta_3\}$. Yager's method yields no large belief value and is not conducive to decision-making, because the result indicates that there may be other propositions besides $\{\theta_1\}$, $\{\theta_2\}$ and $\{\theta_3\}$. This defeats its original purpose of indicating the degree of belief in certain propositions. Wang et al.'s method gives a correct result in conflict management. However, its main disadvantage is that unimportant propositions are also given a large belief value. The proposed method gives a more satisfactory result. The proposition $\{\theta_3\}$ is still considered possible. Although the value of $m(\{\theta_3\})$ is low, it is significant because conflict information should not simply be ignored. Compared with Wang et al.'s method, the proposed method assigns a higher belief to the correct target.
The advantage of the proposed method can also be seen from Table 7. Only two samples are not classified correctly, and the total classification accuracy reaches 93.33%. Besides, in most correctly classified cases, the maximum BBA value is significantly higher than the values of the other two classes. For example, in the classification of Setosa ($\theta_1$), the value of $m^{**}(\{\theta_1\})$ is much larger than $m^{**}(\{\theta_2\})$ and $m^{**}(\{\theta_3\})$. Overall, the correlation belief function handles the fusion of conflicting data properly.
Wine data set classification. In this experiment, the Wine data set is used to further verify the effectiveness of the proposed method. The Wine data set contains 3 different varieties of wine, and each sample has 13 attributes. In the Wine data set, there are 59 samples in class $\theta_1$, 71 samples in class $\theta_2$, and 48 samples in class $\theta_3$.
As with the Iris experiments, a test sample is chosen to demonstrate the effectiveness of the proposed method. Besides, to further test the performance of the proposed method on the classification problem, the cross-validation method from machine learning is introduced to divide the dataset. For the Wine dataset, the 10-times-5-fold cross-validation method is adopted, and its process is as follows.
1. Randomly shuffle the dataset. Divide the Wine data set $D$ into five mutually exclusive subsets ($D_i$, $i = 1, 2, \ldots, 5$) of the same size. In brief, $D = D_1 \cup D_2 \cup \cdots \cup D_5$ and $D_i \cap D_j = \emptyset$ for $i \neq j$.
2. Take one of the subsets ($D_i$, $i = 1, \ldots, 5$) as the test set and the other four subsets as the training set, and then calculate the classification accuracy. Repeat this step five times, each time with a different test set. In other words, $D_1$ is selected as the test set the first time and $D_2$ to $D_5$ are selected as the training set. The second time, $D_2$ is selected as the test set and $D_1$, $D_3$ to $D_5$ are selected as the training set, and so on, until $D_5$ is selected as the test set the fifth time and $D_1$ to $D_4$ are selected as the training set.
3. Repeat the previous steps ten times to obtain the average classification accuracy.

[Flowchart: gather the belief into single subset propositions in the BBAs; transfer the beliefs to correlated multi-subset propositions; fuse the modified BBAs by Dempster's combination rule and apply the pignistic probability transformation; choose the single subset proposition with the highest belief as the classification category.]
The purpose of K-fold cross-validation is to make full use of the available data, avoid errors caused by randomness, and make the evaluation results as close as possible to the generalization ability of the model.
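The 10-times-5-fold procedure can be sketched with the standard library alone. The function names `repeated_kfold_indices` and `average_accuracy`, the seed, and the `evaluate` callback (which stands in for training and testing the classifier) are assumptions for illustration. Since the 178 Wine samples do not divide evenly by five, the folds in this sketch differ in size by at most one sample.

```python
import random


def repeated_kfold_indices(n_samples, k=5, repeats=10, seed=0):
    """Yield (train_indices, test_indices) for `repeats` rounds of shuffled k-fold CV."""
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        # Deal the shuffled indices into k mutually exclusive folds.
        folds = [order[i::k] for i in range(k)]
        for i in range(k):
            test = folds[i]
            train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
            yield train, test


def average_accuracy(n_samples, evaluate, k=5, repeats=10, seed=0):
    """Average the accuracy returned by evaluate(train, test) over all splits."""
    scores = [
        evaluate(train, test)
        for train, test in repeated_kfold_indices(n_samples, k, repeats, seed)
    ]
    return sum(scores) / len(scores)
```

A call such as `average_accuracy(178, evaluate)` would run 50 train/test evaluations in total, with `evaluate` supplied by the user.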
Firstly, the original BBAs generated by triangular fuzzy number are shown in Table 8 and they are visually displayed in Fig. 6. Next, the result modified by the proposed method is compared with other methods and shown in Table 9. Finally, the classification accuracy resulting from 10-times-5-fold cross-validation is shown in Table 10.
As with the BBAs in the Iris experiment, different pieces of evidence in the real world often support different classes, which significantly compromises the accuracy of category judgments. For instance, as illustrated in Table 8, the evidence produced by the Hue and Proline attributes rejects the notion that the sample should be assigned to category $\theta_1$, while the Malic acid and Alkalinity of ash features hold a different view, that is, proposition $\{\theta_1\}$ has a higher belief value. In addition, although all evidence provides belief for proposition $\{\theta_2\}$, the values differ significantly. To handle these conflicts, the proposed method provides a plausible interpretation of the conflicting evidence while preserving the conflict information of the original evidence.
As shown in Table 9, the result given by using only Dempster's combination rule is too absolute. It assigns a 100% belief to the proposition $\{\theta_2\}$, while the belief of the other propositions is 0, which is illogical because Fig. 6 indicates that proposition $\{\theta_1\}$ also has a certain belief value. In addition, Dempster's method is less robust: if one piece of evidence is wrong, the conflict coefficient is likely to be 1, and the data fusion cannot be carried out. Yager's method shares this disadvantage of Dempster's method, because it only reduces the belief value proportionally. Due to the multiple combinations, the belief of some propositions is too small to provide useful information, which means that most information is lost in the fusion process. Deng et al.'s method works well in this experiment. However, its main problem is that it assigns a large amount of belief to unimportant propositions, which reduces the belief value of important propositions and is unfavorable for decision-making. When the data collected from a few sensors are in conflict, it is more likely that some sensors are not accurate. However, if many sensors indicate that there is conflict in the data, it is reasonable to believe that some unusual conditions do exist. The proposed method handles this issue well. Firstly, it does not lose information in representing the main propositions, and the information is expressed through the single subset propositions as much as possible for an easier decision-making process. Secondly, when dealing with conflicting information, this method also takes it into consideration and expresses it in the combination result. Most importantly, this method uses all the information when fusing data, and it is in line with people's cognition. The combination result in Table 9 shows the superiority of the proposed method.
Table 8. BBA generated by using Wine data set.
Compared with the other methods, the proposed method gives a certain belief to the propositions $\{\theta_1\}$ and $\{\theta_3\}$ so that the conflict information is not discarded. Meanwhile, it does not significantly reduce the belief of the proposition $\{\theta_2\}$, which is of great help in decision-making. Furthermore, it can be seen from Table 10 that the average classification accuracy reaches 86.25%, and the accuracy fluctuates around 86% in each case, with a lowest value of 85.38% and a highest value of 87.14%. The result indicates the effectiveness and stability of the proposed method.

Metrics of classification results.
To further evaluate the performance of the proposed method in the classification problem, different metrics in machine learning are adopted to measure the classification results.
The metrics are listed as follows.
-Precision: The ability of a classification model to identify only the relevant data points, $Precision = \frac{TP}{TP + FP}$.
-Recall: The ability of a model to find all the relevant cases within a data set, $Recall = \frac{TP}{TP + FN}$.
-Accuracy: Proportion of data correctly judged by the model in the total data, $Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$.
-F1-score: The harmonic mean of precision and recall, $F_1 = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall}$.

True positive (TP) means that the prediction is correct and the real value is positive. False positive (FP) means that the prediction is incorrect and the real value is negative. True negative (TN) means that the prediction is correct and the real value is negative. False negative (FN) means that the prediction is incorrect and the real value is positive. Based on the aforementioned metrics, the Macro-averaging (Macro-avg, the arithmetic average) parameters ($Macro\_P$, $Macro\_R$, $Macro\_F_1$) and Weighted-averaging (Weighted-avg, the weighted average) parameters ($Weighted\_P$, $Weighted\_R$, $Weighted\_F_1$) over the indicators of each category are denoted as follows.

$$Macro\_P = \frac{1}{n} \sum_{i=1}^{n} Precision_i, \quad Macro\_R = \frac{1}{n} \sum_{i=1}^{n} Recall_i, \quad Macro\_F_1 = \frac{1}{n} \sum_{i=1}^{n} F_{1i}$$

$$Weighted\_P = \sum_{i=1}^{n} W_i \cdot Precision_i, \quad Weighted\_R = \sum_{i=1}^{n} W_i \cdot Recall_i, \quad Weighted\_F_1 = \sum_{i=1}^{n} W_i \cdot F_{1i}$$

where $n$ is the number of categories, $Precision_i$, $Recall_i$ and $F_{1i}$ are the Precision, Recall and $F_1$ of the i-th category respectively, and $W_i = \frac{\text{number of the i-th category}}{\text{total number of data}}$.

The results for these metrics on the Iris data set and Wine data set are shown in Tables 11 and 12. Table 11 reports the results of the proposed method for different classification metrics on the Iris data set. It shows that the proposed method performs well in all metrics of class $\theta_1$, and the recall rate is equal to the precision rate and reaches 93.33% for both macro-averaging metrics and weighted-averaging metrics. Therefore, for samples with balanced data and clear classification boundaries like the Iris data set, the proposed method can utilize the information and address the uncertainty well.
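The macro-averaged and weighted-averaged metrics can be computed from per-class one-vs-rest counts, as in the following sketch. The function names and the toy labels are assumptions for illustration, and returning 0.0 when a denominator is zero is a convention chosen here, not something stated in the paper.

```python
def per_class_metrics(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1, support)} using one-vs-rest counts."""
    result = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        # Assumed convention: a metric with a zero denominator is reported as 0.0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        result[c] = (precision, recall, f1, tp + fn)
    return result


def macro_and_weighted(y_true, y_pred, labels):
    """Return ((Macro_P, Macro_R, Macro_F1), (Weighted_P, Weighted_R, Weighted_F1))."""
    per = per_class_metrics(y_true, y_pred, labels)
    total = len(y_true)
    macro = tuple(sum(per[c][k] for c in labels) / len(labels) for k in range(3))
    weighted = tuple(
        sum(per[c][3] / total * per[c][k] for c in labels) for k in range(3)
    )
    return macro, weighted


# Toy labels for illustration only (not experimental data).
y_true = [1, 1, 2, 2, 3, 3]
y_pred = [1, 1, 2, 3, 3, 3]
macro, weighted = macro_and_weighted(y_true, y_pred, labels=[1, 2, 3])
```

When every class has the same number of samples, as in this toy example, the weights $W_i$ are equal and the macro and weighted averages coincide.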
The proposed method also has unique advantages for samples with many attributes and imbalanced data such as the Wine data set. As shown in Table 12, the accuracy rate increases as the number of samples in a category increases. The precision rate is also slightly higher than the recall rate in terms of the weighted-averaging metric, which means that the proposed method is more suitable for scenarios where false positive samples should be avoided, such as spam blocking systems.

Discussion
Robustness of the proposed method in classification problem. In this section, the robustness of the proposed method in classification problems is discussed. If an algorithm performs well in the classification accuracy of the test set regardless of the large proportion of the training set or the small proportion of the training set, it indicates that the algorithm has strong robustness.
The Iris data set is selected to obtain the classification accuracy of the test set under different proportions of the training set by performing 10 times randomized leave-out method, and the result is shown in Table 13. Each column of the table represents the result of the n-th leave-out method, and each row of the table represents the result of the training set with different proportions. For a more visual presentation of the result, Fig. 7 visualizes Table 13, where the higher the accuracy, the higher the column and the more the color of the column is skewed   Fig. 7 shows that the classification accuracy increases as the proportion of the training set becomes larger, and the accuracy is mostly above 90%, which illustrates the strong robustness of the proposed method.
To further illustrate the effectiveness of the proposed method, the average classification accuracy for each proportion of the training set is calculated. The result compared with Wang's base belief function 77 is shown in Fig. 8. It can be seen that in most cases, the red solid line (the proposed method) is above the blue dotted line (Wang's method), which means that the proposed method has a better performance in classification problem. In addition, the trend of polyline indicates that the larger the proportion of the training set, the higher the classification accuracy. When the proportion exceeds 40%, the classification accuracy of the proposed method can reach 90.44%. However, when the proportion exceeds 80%, the classification accuracy is difficult to be greatly improved. The results of this experiment accord with the practical application.
Comparative analysis in classification problems. In this section, the proposed method is compared with Abellán's method 75 , Jing and Tang's method 79 , Wang's method 77 , Murphy's method 74 , Yager's method 76 and Dempster's method 34 to further demonstrate its superiority in classification problems. Four data sets are adopted in the experiments, namely the Iris data set, the Wine data set, the Seeds data set and the Penguins data set. The Iris and Wine data sets have been introduced in sections "Iris data set classification" and "Wine data set classification". The Seeds data set comes from the UCI Machine Learning Repository and consists of three classes, θ1, θ2 and θ3, each containing 70 samples with 7 attributes. The Penguins data set is from Palmer Station Antarctica LTER and also consists of three classes, θ1, θ2 and θ3: class θ1 contains 151 samples, class θ2 contains 123 samples and class θ3 contains 68 samples. Each sample has 2 attributes.
For each data set, stratified sampling is adopted: 80% of the data is used as the training set, and the remaining 20% is used as the test set. The results of the comparative experiment on each data set are shown in Figs. 9, 10, 11 and 12 respectively. Each color of the histogram represents a category, the length of the column represents its classification accuracy, and the value to the right of the dotted line marks the classification accuracy of the most effective method. As can be seen from Figs. 9, 10, 11 and 12, the proposed method achieves the highest classification accuracy on all data sets. Although the proposed method does not necessarily achieve the highest accuracy for every individual class of a data set, it still achieves the highest classification accuracy on the complete data set. For example, both Abellán's method and Yager's method outperform the proposed method in classifying class θ1 of the Wine data set, but for the complete Wine data set, the proposed method achieves the highest accuracy. The comparative analysis further demonstrates the superiority and stability of the proposed method.
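The 80/20 stratified split and the distinction between per-class and overall accuracy can be sketched as follows. This is an illustrative sketch only; the function names are hypothetical, and the rounding rule used for the per-class cut is an assumption, since the text does not state how fractional sample counts are handled.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.8, seed=0):
    """Split sample indices so each class keeps about the same train/test ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train, test = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Assumption: fractional counts are rounded to the nearest integer.
        cut = round(train_fraction * len(members))
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test


def accuracy_report(y_true, y_pred):
    """Return (per-class accuracy, overall accuracy) for paired label lists."""
    total, correct = defaultdict(int), defaultdict(int)
    for truth, guess in zip(y_true, y_pred):
        total[truth] += 1
        correct[truth] += int(truth == guess)
    per_class = {c: correct[c] / total[c] for c in total}
    overall = sum(correct.values()) / sum(total.values())
    return per_class, overall


if __name__ == "__main__":
    # Class sizes of the Penguins data set as reported in the text.
    labels = ["theta1"] * 151 + ["theta2"] * 123 + ["theta3"] * 68
    train, test = stratified_split(labels)
    print(len(train), len(test))
```

Because the overall accuracy is a weighted average of the per-class accuracies, with weights given by the class sizes in the test set, a method can trail on one class and still lead on the complete data set.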
The experimental results and the comparative study show that different methods have their own advantages and disadvantages in conflict management. Yager's method 76 is distinctive in that it modifies the combination rule while maintaining the original mathematical properties. However, it is still unreasonable to simply assign the belief of the conflict to the unknown part, and Tables 6 and 9 and the classification results show that the performance of this method is not outstanding. Murphy's method 74 averages the belief of all evidence and then fuses the averaged evidence. Averaging is an effective way to solve the normalization problem in combination, but different pieces of evidence often have different weights, and simple arithmetic averaging loses the specificity of the evidence. Abellán et al.'s method 75 uses a hybrid rule that calculates the maximum conflict between two sets of evidence and then combines it with averaging. Although it appears to perform well, the method must assume that the data source is completely reliable, which often cannot be guaranteed in the real world. Wang et al.'s method 77 adds the base belief to all propositions so that the belief of each proposition is not zero when evidence is fused. It solves the conflicting data fusion problem. However, since each proposition has a base belief value, it often introduces more uncertainty. Jing and Tang 79 modify this method to some extent by adding the base belief only to the single subset propositions and combining it with Bayesian probability, but their method still suffers from the same problem as Wang et al.'s method 77 . The proposed method can effectively solve the conflicting data fusion problem and performs well in classification applications. Nevertheless, the method is still not fully able to delineate clear classification boundaries for samples with multiple attributes and large data volumes, which is worth further study.
The correlation belief function can integrate propositions with a large probability of occurrence and support decisions in complex and uncertain environments.
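The differences among Dempster's, Yager's and Murphy's rules discussed above can be illustrated on a small, highly conflicting example. The following is a minimal sketch under stated assumptions: the rules are written in their commonly used textbook forms rather than taken from the implementations used in the experiments, mass functions are represented as dictionaries from frozensets to masses, the input masses are assumed to sum to 1, and the example values (a Zadeh-style conflict) are illustrative and do not come from this paper.

```python
from itertools import product


def conjunctive(m1, m2):
    """Unnormalized conjunctive combination; returns (masses, conflict mass)."""
    out, conflict = {}, 0.0
    for (b, wb), (c, wc) in product(m1.items(), m2.items()):
        inter = b & c
        if inter:
            out[inter] = out.get(inter, 0.0) + wb * wc
        else:
            conflict += wb * wc  # mass that would fall on the empty set
    return out, conflict


def dempster(m1, m2):
    """Dempster's rule: discard the conflict and renormalize the rest."""
    out, _ = conjunctive(m1, m2)
    if not out:
        raise ValueError("total conflict: Dempster's rule is undefined")
    norm = sum(out.values())  # equals 1 - K when the inputs sum to 1
    return {k: v / norm for k, v in out.items()}


def yager(m1, m2, frame):
    """Yager's rule: move the conflict mass to the whole frame (the unknown)."""
    out, conflict = conjunctive(m1, m2)
    out[frame] = out.get(frame, 0.0) + conflict
    return out


def murphy(masses):
    """Murphy's rule: average n mass functions, then fuse the average n-1 times."""
    n = len(masses)
    keys = set().union(*masses)
    avg = {k: sum(m.get(k, 0.0) for m in masses) / n for k in keys}
    fused = avg
    for _ in range(n - 1):
        fused = dempster(fused, avg)
    return fused


if __name__ == "__main__":
    A, B, C = frozenset("a"), frozenset("b"), frozenset("c")
    frame = frozenset("abc")
    m1 = {A: 0.99, B: 0.01}
    m2 = {B: 0.01, C: 0.99}
    print("Dempster:", dempster(m1, m2))
    print("Yager:   ", yager(m1, m2, frame))
    print("Murphy:  ", murphy([m1, m2]))
```

In this example Dempster's rule places all belief on the proposition that both sources considered very unlikely, Yager's rule sends almost all belief to the unknown part, and Murphy's rule keeps the belief split between the two conflicting propositions.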

Conclusions
When conflicting evidence is fused by using the classical Dempster's combination rule, a counterintuitive result may be produced. To solve this problem, a new correlation belief function is proposed for conflict management in this paper. It first gathers all the belief in the single subset propositions, and then transfers the belief of the single subset propositions to the related multi-subset propositions. The proposed method has two main advantages. First, it can fully utilize the acquired information and avoid the counterintuitive results caused by information loss. Second, compared with other methods, it better reflects the conflicting information among the data in the fusion result. A series of numerical examples validates the effectiveness of the proposed method in conflict management problems. The correlation belief function-based classification method performs well in classification applications. In the robustness test, the method obtains high accuracy even with a small number of training samples. For example, the classification accuracy can reach 84.67% even if the proportion of the training set is only 20%. In addition, different data sets are tested, and the results show that the proposed classification method achieves a higher classification accuracy than the other methods.
Future work can focus on the following open issues. First, the time complexity of the classical Dempster's combination rule is unsatisfactory, which leads to a similar problem in the proposed method 59 . Second, the method can only be applied under the closed-world assumption, and an incomplete frame of discernment can be taken into consideration in the future 42 . Third, there is broad scope for applying the proposed correlation belief function to model uncertainty in other applications such as expert systems 53 . Finally, the proposed method should be applied to more complex classification problems.